Bioinformatics of the Brain (Kayhan Erciyes, Tuba Sevimoğlu)



(https://www.maxquant.org/), Proteome Discoverer (Thermo Scientific) and Skyline (https://skyline.ms/project/home/software/Skyline/begin.view) are among the programs frequently used in these studies. Additionally, various search engines such as MASCOT (www.matrixscience.com), SEQUEST, X!Tandem, OMSSA (Open Mass Spectrometry Search Algorithm) and Andromeda are used for peptide and protein identification [20–24]. Although raw data processing, statistical analysis and visualization can all be performed within a single program, the visualization options these programs offer are limited. For further analysis, proteomic data can therefore be exported to standard formats (e.g., mzML, mzIdentML and pepXML) and visualized using various online tools, programs or programming languages [25, 26].

The figures used in a proteomic study can be grouped into several categories according to their purpose. The first category concerns the quality of the MS data. Quality control of MS data is crucial for the early detection of faults in sample collection or preparation and of problems such as sample overload, contamination and uneven electrospray. For this purpose, it is very useful to visualize the raw data at the precursor and fragment levels and to evaluate it using reference extracted ion chromatogram (XIC) and total ion chromatogram (TIC) plots.
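Both chromatograms are simple reductions over the scans of a run: the TIC sums every peak in a scan, while the XIC sums only the peaks that fall within a narrow m/z window around one ion. The sketch below assumes the raw file has already been read into a list of (retention time, peaks) pairs, a hypothetical toy structure (in practice an mzML reader would supply the scans), and uses a ±10 ppm window as an illustrative default rather than a recommended value.

```python
from typing import List, Tuple

# Hypothetical toy format for centroided MS1 scans:
# (retention time in minutes, [(m/z, intensity), ...]).
Peak = Tuple[float, float]
Scan = Tuple[float, List[Peak]]


def total_ion_chromatogram(scans: List[Scan]) -> List[Tuple[float, float]]:
    """TIC: for each scan, the sum of all peak intensities."""
    return [(rt, sum((i for _, i in peaks), 0.0)) for rt, peaks in scans]


def extracted_ion_chromatogram(
    scans: List[Scan], target_mz: float, tol_ppm: float = 10.0
) -> List[Tuple[float, float]]:
    """XIC: for each scan, the summed intensity of peaks within +/- tol_ppm of target_mz."""
    tol = target_mz * tol_ppm / 1e6  # ppm tolerance -> absolute m/z window
    return [
        (rt, sum((i for mz, i in peaks if abs(mz - target_mz) <= tol), 0.0))
        for rt, peaks in scans
    ]


# Three toy scans; the ion at m/z 500.25 stands in for a reference ion.
scans: List[Scan] = [
    (10.0, [(500.2500, 1000.0), (622.3300, 200.0)]),
    (10.1, [(500.2502, 3000.0), (700.4000, 500.0)]),
    (10.2, [(500.2600, 800.0), (622.3300, 100.0)]),
]
tic = total_ion_chromatogram(scans)
xic = extracted_ion_chromatogram(scans, target_mz=500.25)
print(tic)  # [(10.0, 1200.0), (10.1, 3500.0), (10.2, 900.0)]
print(xic)  # [(10.0, 1000.0), (10.1, 3000.0), (10.2, 0.0)]
```

Plotting these series against retention time gives the TIC and XIC images; a sharp drop in the TIC, or a reference XIC that differs markedly between runs, can flag loading or spray problems before identification is attempted.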

The next step is to examine the distribution of samples and the similarities between groups. The distribution of identified peptides and proteins can be evaluated with histograms and principal component analysis (PCA), while box-and-whisker plots can be used to compare protein abundances across samples. In terms of protein coverage, proteins that are common to the groups or unique to each of them can be represented in a Venn diagram based on their accession numbers. Heat maps and volcano plots, used to evaluate quantitatively the proteins whose expression changes, are constructed from fold changes and p-values, reflecting biological and statistical significance, respectively. Protein abundances should be normalized before such graphs are drawn. In heat maps, rows and columns with similar abundance values are clustered by computing the distances between protein abundances with distance functions such as Euclidean, Manhattan and Pearson correlation distance [25, 27, 28].
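Box-and-whisker plots and the normalization step reduce to a few summary statistics. The sketch below assumes log2 transformation followed by median centering, which is one common normalization choice among several (the text does not prescribe a method); the whiskers follow Tukey's 1.5 × IQR convention. Sample names and intensities are hypothetical.

```python
import math
import statistics
from typing import Dict, List


def box_summary(values: List[float]) -> Dict[str, object]:
    """Box-plot statistics with Tukey whiskers (most extreme points within 1.5 x IQR)."""
    q1, median, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }


def log2_median_normalize(samples: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Log2-transform each sample, then shift it so every sample median equals the
    median of the per-sample medians (an assumed, commonly used normalization)."""
    logged = {name: [math.log2(x) for x in vals] for name, vals in samples.items()}
    medians = {name: statistics.median(v) for name, v in logged.items()}
    target = statistics.median(medians.values())
    return {name: [x - medians[name] + target for x in v] for name, v in logged.items()}


# Hypothetical intensities: patient_1 has the same profile as control_1,
# but every protein reads twice as high (e.g., a loading difference).
samples = {
    "control_1": [1.0e5, 2.0e5, 4.0e5, 8.0e5],
    "patient_1": [2.0e5, 4.0e5, 8.0e5, 1.6e6],
}
normalized = log2_median_normalize(samples)
for name, values in normalized.items():
    print(name, box_summary(values))
```

Before normalization the two boxes are offset by one log2 unit; afterwards they coincide, which is what a box plot of well-normalized samples with the same underlying profile is expected to show.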
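Assigning proteins to the regions of a Venn diagram is a set operation on accession numbers. The sketch below works for any number of groups by recording, for each accession, the exact combination of groups in which it was identified; the group names and UniProt accessions are toy inputs, not data from a study.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Set


def venn_regions(groups: Dict[str, Set[str]]) -> Dict[FrozenSet[str], Set[str]]:
    """Map each exact combination of groups to the accessions found in exactly those groups."""
    regions: Dict[FrozenSet[str], Set[str]] = defaultdict(set)
    for accession in set().union(*groups.values()):
        members = frozenset(name for name, ids in groups.items() if accession in ids)
        regions[members].add(accession)
    return dict(regions)


# Toy UniProt accession sets, for illustration only.
groups = {
    "control": {"P02768", "P68871", "P69905", "P10636"},
    "disease": {"P02768", "P68871", "P37840", "P05067"},
}
regions = venn_regions(groups)
for members, accessions in regions.items():
    print(" & ".join(sorted(members)), len(accessions), sorted(accessions))
```

The region keyed by both group names holds the common proteins; each single-name region holds the proteins unique to that group, and the region sizes are the numbers written inside the Venn diagram.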
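A volcano plot places each protein at x = log2 fold change and y = -log10 p-value. The sketch below computes those coordinates and labels proteins with the thresholds |log2 FC| ≥ 1 (a two-fold change) and p < 0.05; these cut-offs are common but arbitrary assumptions. The p-values are assumed to have been computed beforehand (for example by a t-test, ideally with FDR adjustment), and the abundances are hypothetical normalized group means on a linear scale.

```python
import math
from typing import Dict, Tuple


def volcano_points(
    control: Dict[str, float],
    case: Dict[str, float],
    p_values: Dict[str, float],
    fc_cutoff: float = 2.0,
    alpha: float = 0.05,
) -> Dict[str, Tuple[float, float, str]]:
    """Return (log2 fold change, -log10 p, label) for each protein."""
    log2_cut = math.log2(fc_cutoff)
    points = {}
    for protein, p in p_values.items():
        log2_fc = math.log2(case[protein] / control[protein])  # x-axis
        neg_log10_p = -math.log10(p)                           # y-axis
        if p < alpha and log2_fc >= log2_cut:
            label = "up"
        elif p < alpha and log2_fc <= -log2_cut:
            label = "down"
        else:
            label = "not significant"
        points[protein] = (log2_fc, neg_log10_p, label)
    return points


# Hypothetical mean normalized abundances and precomputed p-values.
control = {"P10636": 100.0, "P37840": 400.0, "P02768": 1000.0}
case = {"P10636": 400.0, "P37840": 100.0, "P02768": 1100.0}
p_values = {"P10636": 0.001, "P37840": 0.01, "P02768": 0.5}

points = volcano_points(control, case, p_values)
for protein, (x, y, label) in points.items():
    print(f"{protein}: log2FC={x:.2f}, -log10(p)={y:.2f}, {label}")
```

Proteins in the upper left and upper right corners of the plot (large effect and small p-value) are the candidates usually carried forward into heat maps and enrichment analyses.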

In addition, further analyses are required to interpret the lists of thousands of proteins obtained by processing the raw data and to reveal their biological significance. Thanks to the growing number of enrichment tools, page-long protein lists can be translated into more understandable visuals.
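Many enrichment tools rest on an over-representation test: does a term (for example a GO category or a pathway) appear in the protein list more often than expected from a background set? The sketch below uses the one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. Individual tools differ in the exact statistic, the background definition and the multiple-testing correction, so this is a generic illustration rather than the algorithm of any particular tool, and the numbers are made up.

```python
import math


def overrepresentation_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value P(X >= k).

    N: proteins in the background, K: background proteins annotated to the term,
    n: proteins in the list of interest, k: listed proteins annotated to the term.
    """
    upper = min(n, K)
    tail = sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / math.comb(N, n)  # exact integer arithmetic, one final division


def fold_enrichment(k: int, n: int, K: int, N: int) -> float:
    """Fraction of annotated proteins in the list divided by the background fraction."""
    return (k / n) / (K / N)


# Illustrative numbers only: 20 of 500 listed proteins carry a term
# that 200 of 20,000 background proteins carry (5 would be expected by chance).
print(fold_enrichment(20, 500, 200, 20000))
print(overrepresentation_p(20, 500, 200, 20000))
```

Because one such test is run for every term, the resulting p-values are normally adjusted for multiple testing (for example with the Benjamini-Hochberg procedure) before enriched terms are plotted.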

Online databases such as PANTHER (https://www.pantherdb.org/), STRING (https://string-db.org/) and DAVID (https://david.ncifcrf.gov/) are widely used for gene ontology (GO) analyses. In addition, the Cytoscape software platform (https://apps.cytoscape.org/apps/all_#downloads), which supports a wide range of plug-ins such as ClueGO, CluePedia, PiNGO, BiNGO and CytoCluster, gives researchers more options for enrichment analyses and greater flexibility in producing figures. Various databases are used for functional enrichment analyses. KEGG (https://www.genome.jp/kegg/) and REACTOME (https://reactome.org/) are the most frequently referenced databases for